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MEMORANDUM FOR FILE 

LIL is a Little implementation Language for the PDP-11 family of computers. It is little in the 
important sense that there are very few constructs and combining rules, and fewer exceptions. It 
is an implementation language, which means that it is possible to express any of the code needed 
to implement an operating system, including I/O drivers, interrupt routines, and transfer tables. 
Nevertheless it remains a moderately high-level language, with the improved readability and reli¬ 
ability that one expects from avoiding assembly-language level coding. 

This document is a reference manual for LIL. It provides a comprehensive, and reasonably 
precise, description of the language, but makes no attempt to introduce concepts in a tutorial ord¬ 
er. 

1. Invocation 

The LIL compiler is invoked by 
1c args 

where args are source files if they end in .1, flags if they begin with —, and object files otherwise. 
If more than one source file is specified, or if the —c flag is present, object code for each file 
whose name ends in .1 is written on a file with a similar name, only ending in .o. 

Unless —c (compile only) or —P (preprocess only) is present, all object files are presented to 
the loader for binding, along with the standard C libraries. Profiling may be invoked with —p 
(see cc (I), prof (I) Id (I), monitor (III) in the UNIX manual). 

2. Preprocessor 

If the first character of a LIL source file is a #, a string-oriented preprocessor is invoked. 
This scans the source for lines beginning with a # and recognizes the following non-empty com¬ 
mand lines: 

# include 'pathname' — The file specified by pathname is included in the source in place of the 

command. Line numbering of the remainder of the source file is unaffected. For the 
sake of diagnostics, all lines in the included file have the same line number. 

# define SYMBOL string — The string of characters on the remainder of the line is remem¬ 

bered, along with the SYMBOL name, and the defining line is blanked out. All subse¬ 
quent occurrences of SYMBOL are replaced by string, surrounded by blanks. SYMBOL 
is a maximal string of letters, digits, dot V, and underline beginning with a non-digit. 
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The —P option on the invocation line may be used to cause preprocessor output for each 
file whose name ends in .1 to be written on a file with a similar name, only ending in .i. PL/l 
style comments are permitted on preprocessor command lines. 

3. Lexical analysis 

The source file is treated as a series of tokens, possibly separated by white space, which con¬ 
sists of one or more blanks, tabs, newlines, and/or comments. Comments are arbitrary strings 
beginning with a % and ending with the next newline. Except for its important function of 
separating tokens, white space is ignored, and so should be used freely to emphasize the logical 
structure of the code. 

Tokens are one of the following strings of characters: 
identifier — one or more letters, digits, V, and/or ‘J, beginning with a non-digit. Only the first 
eight characters are used to distinguish identifiers, 
octal number — a 0, followed by any number of digits. Only the low order sixteen bits of the 
value represented by the string are retained. 

decimal number — a non-zero digit, followed by any number of digits. If the value represented 
by the string is greater than 32767, its value as a token is uncertain, 
character string — a pair of single quotes enclosing any number of characters. Each character 
may be any valid ascii code except newline or single quote. A \ is combined with the 
character following to form a single character; the following mapping occurs: 


\0 becomes 

000 

NUL 

\a 

006 

ACK 

\e 

004 

EOT 

\n 

012 

NL 

\p 

033 

PRE (ESC) 

\r 

015 

CR 

\t 

Oil 

TAB 


All other characters following \ are mapped into themselves (including \, newline and 
single quote). 

operator — one or more of the characters from the set '=+—< >?*/&l!"*’. The pairs f and '! 
are each interchangeable. 

punctuation — A character from the set ‘;,{}II0Y. The pairs {( and )} are each interchangeable, 
but it is recommended that the stylistic usage followed in this description be retained 
wherever possible. The equivalences are provided to ease typing readable LlL programs 
on a 64-character terminal. 

Other ascii characters not shown here will produce a diagnostic message and be skipped. 

4. Keywords 

Certain identifiers are predefined as keywords, i.e. tokens with special syntactic meaning. 


These identifiers are: 

break 

goto 

rts 

byte 

if 

sizeof 

continue 

jsr 

sys 

do 

local 

while 

else 

mem 

word 

extern 

reg 



Keywords may not be redefined in local regions, unlike other identifiers. 
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It is possible, however, to remove an identifier from the keyword class, once and for all, by 
means of the escape \. Any occurrence of a keyword immediately preceded by a \ is treated as a 
reference to a hitherto undefined identifier of that name. This feature should only be used when 
absolutely necessary, as when interfacing to a foreign language routine, and then only near the 
end of a lil source file. 

With this understanding, the term identifier will be used hereafter to mean any non¬ 
keyword or a keyword that has been properly escaped. 

5. Locatives 

Identifiers, octal numbers, decimal numbers, and character strings no longer than two char¬ 
acters are the simplest forms of locatives, which are used to specify the location of operands for 
generating machine instructions, or which are the actual operands of compile-time operations. A 
locative has a value, which may be defined or undefined, absolute or relocatable with respect to 
some bias. Each locative also has a type, which corresponds closely to a valid PDP-11 addressing 
mode or specifies a condition, a size in bytes, and the optional attributes external and byte. Octal 
numbers, decimal numbers, and short character strings, for instance, are by default type immedi¬ 
ate, and have values which are defined and absolute. 

Unary operators are provided to alter type and other attributes. In addition, a number of 
predefined identifiers are provided from which type can be inherited. The basic types are: 
register — The value of the locative must be defined absolute between 0 and 7, inclusive. The 
operand is the contents of the general register specified by the value, 
memory — The value of the locative is taken as an address in memory at which the operand is 
to be found. The value may be absolute, to refer to special addresses used by the 
hardware, or relocatable, to refer to addresses within the user program. If the value is 
undefined, it is assumed that a definition will be provided later in the program text (for¬ 
ward reference) or at load time from another object module (undefined external), 
immediate — The value of the locative itself is to be the operand. Immediates may also be 
undefined or relocatable. 

More elaborate types are built from these primitives, or from expressions whose resultant 
type is register, memory, or immediate. Square brackets are used to indicate indexing or 
indirection; in conjunction with additional unary operators the remaining types are specified: 
autodecrement — The notation —(rexp), where rexp indicates a register locative or an expres¬ 
sion whose resultant type is register, specifies autodecrement addressing using the regis¬ 
ter determined by the value of the locative or expression, 
autoincrement — The notation Irexp|++ specifies autoincrement addressing in an analogous 
fashion. ++ is the only suffix operator defined in LIL. 
indexing — The notation xlrexpl, where x is a locative of type memory or immediate, specifies 
indexing on the register determined by rexp. The value of the locative is the same as x, 
while the index register used becomes part of the type information, 
indirection — The notation (loci, where loc is any locative that does not itself involve indirect 
addressing, specifies indirect addressing. By special dispensation, however, llrexpll is 
permitted; it is translated into KHrexpl). This notation results in a family of types: in¬ 
direct on register, indirect on memory, indirect on autodecrement, etc. 

The final type which locatives may assume is condition, which specifies what state of the 
condition code must obtain for a given test to succeed. Such locatives are always defined abso¬ 
lute, and have mystical values which closely resemble the appropriate op codes for conditional 
branches. Conditions usually arise from relational expressions or tests and seldom need be dealt 
with explicitly. 
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6. Predefined identifiers 

lil begins each compilation with a number of predefined identifiers which are often of use 
in writing code. 

All eight registers may be referenced by their conventional names: 

rO r4 

rl r5 

r2 sp 

r3 pc 

The symbol dot V is updated at the end of each statement to point to the next location at 
which code is to be generated. Similarly, setting dot to a new value causes loading to be diverted 
to the place specified. The location counters for the bss, text, and data sections (see ld(I) in the 
UNIX manual) are named .bss, .text, and .data, respectively. In addition, the absolute location 
counter .abs is provided for convenience, to remember the latest absolute value dot may have as¬ 
sumed. All of these locatives are of type memory. 

Conditions corresponding to testing just one of the condition code bits may be specified by 
using the predefined: 
carry 
minus 
oflow 
zero 

These correspond to the C, N, V, and Z bits, respectively. Unconditional tests may be written 
with 

false 

true 

in an obvious fashion. 

All of these predefined identifiers may be superseded by local definitions, making the origi¬ 
nal versions unreachable but still not interfering with their maintenance by the compiler. There 
is one additional predefined identifier, however, which serves as a flag; its latest edition is always 
consulted by LIL. This is .temp, which is used to specify whether or not the top element of the 
stack Ispl is to be used to hold an argument on function calls. If the standard function call nota¬ 
tion is used, therefore, this must be considered a reserved identifier at all times. 

7. Expressions 

Data manipulation instructions are generated by writing expressions involving locatives and 
binary infix operators: 

rO * x; % causes rO to be loaded from x 

rO + 1; % causes rO to be incremented 

rO = x + 1; % load then increment rO 

The interpretation of the last statement differs markedly from that used in most languages. All 
infix operators are of equal precedence in LIL, evaluation proceeds strictly left to right. The left 
operand is retained for use with each subsequent operator, permitting the shorthand shown in 
the example. (The third line is equivalent to the first two.) 

Parentheses may be used to modify this strictly left to right order of evaluation: 
rO = (x + 1); % increment x, then move x into rO 
The rule is: evaluate the left operand, performing any calculations inside parentheses or brackets, 
then do the same for the right operand, then perform the operation specified between the two 
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operands. The leftmost locative inside parentheses becomes the locative to use as operand. 

All unary operators bind tighter than binary operators, as does the specification of indexing 
or indirect addressing with square brackets. Individual unary and binary operators will be 
described in a later section. 

8. Compile time 

To perform arithmetic with symbols, without generating code, and to define identifiers, one 
writes what look like normal expressions, surrounded by double quotes. Thus 
"x = rO"; % defines x 

"rO + 1" = x; % loads x into rl 

Compile-time expressions are actually evaluated in a completely different context than run-time 
expressions, using the values of the locatives themselves as operands. Most operators have 
compile-time definitions analogous to their meanings at run time. Unless it contains an explicit 
assignment operator (= or —>), a compile-time expression is computed without changing the 
value of any identifiers. 

Compile-time binary operators will also be tabluated later on. 

9. Statements 

An expression, terminated by a semicolon, is one of the simplest statements permitted in 
LIL. An immediate or memory locative standing alone, followed by a semicolon, is treated as a 
request to generate one word of code whose value is the value of the locative. Character strings 
of any length may be used in this context to generate a series of code words; the last is padded 
with a null byte if the number of characters is odd. It is also permissible to write just a sem¬ 
icolon to specify a null statement. 

The simplest conditional statement is: 
if (test) statement; 

test may be any locative, or a conditional expression containing parentheses, the prefix not opera¬ 
tor *, the infix and operator &&, and/or the infix or operator II. Conditional expressions are 
evaluated left to right, following the same rules as for normal expressions, except that the right 
operand of && will not be evaluated when the left operand is false, and the right operand of II 
will be likewise skipped when the left operand is true. Moreover, && binds tighter than II for 
the sake of determining the truth value of the entire test. If a locative is not a condition, it is re¬ 
placed by that condition obtained by testing if the locative is nonzero. 

The not operator, of course, reverses the sense of any condition. Care must be taken, how¬ 
ever, to ensure that its operand is unmistakably a condition, for ' applied to an immediate mere¬ 
ly ones-complements its value. 

The if causes the controlled statement to be skipped if the test is not met, otherwise the 
statement is obeyed. To specify an alternate action when the test fails, write 
if (test) statement; else statement; 

The else part is skipped over if the first statement is executed. The first statement may be 
another if, in which case there is a possibile ambiguity in pairing subsequent else clauses. This 
ambiguity is arbitrarily removed by binding each else clause to the innermost ‘unelsed’ if. 

Loops are specified by 

while (test) statement; 
if the test should occur at the top, or by 

do statement; while (test) statement; 
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if the test should occur at the end'if the test should occur somewhere in the middle of the loop 
or at the end (second statement is null). Both forms exit when the test fails and contain an im¬ 
plicit branch back to the beginning from the end of the form. 

10. Groups 

Conditional and loop statements can be made much more powerful by having them control 
groups. A group is one or more statements in succession surrounded by braces: 
if (rO < 0) % if r0 is negative 

(rO =- rO; % negate ri) 

rl + 1; } % and count 

No semicolon is used after a group, nor is the semicolon left off the last statement in the group. 
Now the if will skip two statements if the test fails. Groups may, of course, contain other 
groups to any depth of nesting. 

A second form is the labelled group: 

label! statement; 
statement; 

} 

where label is an identifier. In addition to grouping statements for control purposes, this form 
also delimits a local region, in which identifiers may be redefined without clashing with other 
usage, and causes the label to be defined as type memory with the value of the location counter 
at the start of the group (although the definition actually occurs at the close of the group). 

Two control statements work in conjunction with loops or labeled groups. The forms 

break; 

continue; 

cause control to be transferred out of the innermost containing loop or labeled group, or back to 
its top, respectively. And the forms 

break label; 
continue label; 

perform similarly, except that all containing loops or groups whose labels do not match the label 
in the statement are ignored. Instead, control is transferred out of, or to the top of, the inner¬ 
most containing labeled group whose label matches. 

11. Function call and goto 

A C compatible function call is provided: 

funO; % no arguments 

fun(arg, arg, arg, ...); 

where arg is any locative or expression except a condition. Any argument expressions are 
evaluated left to right, then the arguments are moved onto the stack right to left. The function is 
called with a 

jsr pc,*$fun 

instruction and the arguments are popped off the stack. 

The latest edition of the flag variable .temp is consulted on each call with arguments to 
determine whether to use the top element of the stack Ispl to hold the rightmost argument. The 
flag is initially zero, indicating that Ispl is not to be used. 
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Each function call becomes a locative of type register and value zero. 

Control can be transferred to an arbitrary destination by 
goto loc; 

where loc is any locative that may be used with a jmp instruction (a br will be generated instead, 
wherever possible). It should not be necessary to use goto except in unusual circumstances. 

12. Declarations 

Identifiers are made implicitly local to the first region in which they are referenced. If, by 
the close of that region, no definition is provided, the identifier is promoted to the next contain¬ 
ing region. Identifiers undefined at the close of the source file are published as undefined exter¬ 
nals. To explicitly create local identifiers, use 
"local ident, ident, ident, 

where ident is an identifier, whose size may optionally be specified by writing 
ident sizeof loc 

for loc defined absolute immediate. Default size is two bytes. If these idents are not defined by 
the close of the current local region, a diagnostic is produced. 

Similarly, one can declare that certain identifiers are to be made known to other compila¬ 
tions, if defined, or are to be obtained from other object modules, if undefined, by writing 
"extern ident, ident, ident, 

An identifier explicitly declared local and not defined must also appear in an external declaration. 

A useful variant of the local statement is 
"locfident, ident, ident, 

For each ident, left to right, this 

1) declares ident to be local and defines it to have all the same attributes, except size, as loc. 

2) increments the value of loc by the size of ident (if loc is an identifier this constitutes a 

redefinition). 

The final size of loc is the sum of the sizes of the idents. 

13. Load Control 

LlL begins loading in the text section, but permits code to be generated also in the data sec¬ 
tion, by redefining dot, provided no attempt is made to back up over generated code. 

". = .data"; % switch to data area 

x{-l; } 
y{0; 0; 0; ) 

". = .text"; % switch back 

By the rules for labeled groups, however, such a temporary diversion may also be written as 
". = .data"{ 
x{—1; } 
y{0; 0; 0; } 

} 

provided x and y have been previously referenced or declared. 

Dot can be set to any absolute value, or to point anywhere in the bss section, so long as no 
attempt is made to generate code there. 



UNIX treats all undefined external references with nonzero size as labeled common blocks, 
and refuses to satisfy such references with text symbols, lil will set the size attribute to zero on 
any identifier used in a function call. Nonstandard entries^wichVe not defined in the current 
file, however, should be declared by v 

"extern entryO"; 

to set the size attribute of entry to zero and declare it external. 

14. Unary operators 

Unary operators have the same meaning at compile-time as at run-time, for none of them 
generate code (except —). Instead, they modify references to symbols or expressions, by changing 
their type, value, or other attributes. For convenience the following symbols are defined: 

! — non-byte memory reference. 

m, n — absolute (i.e. unrelocatable) immediate. 

r — register. 

v, w — any non-byte address type. 

x, y — any PDP-11 address type, including byte. 


The unary operators are: 

x-H- — converts type register indirect to type autoincrement. 

—x — converts type register indirect to type autodecrement. 

=n — compiles the absolute immediate or string n into the data section, appending a zero word 
if the last byte is not null. Result is type immediate, value is address of start of the allo¬ 
cated string. 

&x — converts x to type immediate, removes byte attribute, 
mem x — converts x to type memory, removes byte attribute. 

reg x — converts x to type register, removes byte attribute, x must be defined absolute with 0 
<* value <= 7. 
word x — removes byte attribute, 
byte x — attaches byte attribute. 

-n — negates absolute immediate n. 

~n — complements absolute immediate n. 

rts r — converts type register r to an immediate with value of rts r instruction, 
jsr r — converts register r to an immediate with value of jsr r,*$ instruction. (Should be fol¬ 
lowed by address of routine.) 

sys n — converts immediate n to an immediate with value of sys n instruction, 
sizeof x — converts x reference to an absolute immediate with value equal to size attribute of x 
reference. 

Ill n — shifts absolute immediate n left 8. 
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15. Compile-time binary operators 

x>=y, x>y, x==y, x<y, x<=y, x"=y — result is a condition with value true if the relation 
obtains, otherwise false. Both x and y must have the same relocation bias. Compari¬ 
sons are all on signed quantities. 

x=y, y->x — x is defined to have the same type, value, and other attributes as the y reference. 

x must be previously undefined (unless it is dot or one of the four location counters). 
x=—n — x is defined as the negative of the absolute symbol n. 

— x is defined as the complement of the absolute symbol n. 

The type of x is left unchanged in the following: 
x+y — result is the sum of x and y. At most one of the two may be relocatable or undefined, 
x-y - result is the difference of x and y. If y is relocatable or undefined, it must have the same 
bias as x. 

n'm — result is n equivalence m (not exclusive or). 
n”m — result is n exclusive or m (not equivalence), 
nlm — inclusive or. 
n&m — and. 

n&'m — same as n&Cm). 

n**m - n is shifted left (if m>0) or right (if m<0) by Iml. 
n*m — multiplication, 
n/m — division. 

nlllm — same as (n&0377) I (m*'"8) [packs bytes], 
x sizeof n — result is x with size n. 

16. Run-time binary operators 

If x and y are not both byte or both word, then one of them must be a non-byte register or 
immediate. The instruction generated will then be byte mode if either operand is byte. The fol¬ 
lowing is a metalinguistic description of the decisions made by LlL in producing code for each 
operator. 

x>=y, x>y, x==y, x<y, x<=y, x~=y [compare signedl 
x>>=y, x>>y, x<<y, x<<=“y [compare unsigned! 

if (y==0 && cc set on x) do nothing 
else if (y==0) tst(b) x 

else if (x==0) tst(b) y 

else cmp(b) x,y 

x=y, 

y->x if (y==0) clrtb) x 

else if (x=={carry oflow zero minus] && y=={true false}) 
setx or secx 

else if (y==minus) sxt x 
else mov(b) y,x 

x=~y if (y==0) clrtb) x 

else if (x==y) neg(b) x 

x='y if (x=={carrv oflow zero minus] && y=={true false}) 
secx or setx 
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else if (x==y) 

com(b) x 

x+y if (y==*l) 

else if (y==carry) 
else 

inc(b) x 
adc(b) x 
add y,x 

x-y if (y—=1) 

else if (y=*carry) 
else 

dec(b) x 
sbdb) x 
sub y,x 

xly 

bis(b) y,% 

x&n 

bic(b) $!n,x 

x&'y 

bic<b) y.x 

x”r 

xor r,x 

r*y 

mul y,r 

r/y 

div y,r 

x?y if (y==0) 

else 

tsKb) x 
cmp(b) x,y 

x?&y 

birth) x,y 

xOn if (n*“l) 

else if (n=* —1) 

rol(b) x 
roKb) x 

x<*>n if (n==8) 

else if (x odd reg) 

swab x 
ashc n,x 

x**n if (n--l) 

else if (n*« —1) 
else if (x a register) 

asl(b) x 
asrib) x 
ash n,x 

r***n ashc n,r 


Both the sizeof and III operators are also defined at run-time and have the same effect as at 
compile-time. They do not directly cause code to be generated. 

The compiler is aware of what is happening to the condition code most of the time. It 
knows, for instance, that swab and function calls do not leave the code in a state implied by the 
language, and so will generate a tst if the result of either is to be used in a test. The PDP-11 is 
somewhat whimsical about the setting of the carry bit, however, and LIL makes no real attempt 
to second guess the machine. (It makes a difference whether you add 1 or 2 to something, for 
instance.) It is always a good idea to be very careful when testing for special conditions. 
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17. Conditional Compilation 

If the test in any conditional or loop statement is unconditionally true or false, no code is 
compiled for the test. If the test is false, in fact, no code is compiled for the statement con¬ 
trolled by the test (the branch back to the top of a loop is also omitted). In the if-else form, one 
and only one of the two statements is compiled. Compile-time parameters, in conjunction with 
compile-time relational expressions, can thus be used to cause selective generation of code. 

18. Pedigree 

LIL descends from an implementation language for the GTE TEMPO-I, written by P.D. Jensen 
and A.G. Fraser. Compile-time notation and the rigorous left-to-right expression evaluation are 
carryovers. The current syntax is heavily influenced by that of the language c, designed by D.M. 
Ritchie; the preprocessor and invocation control are a direct steal. 

All compilation decisions are made in one pass of a simple syntax-directed translator, writ¬ 
ten by the author. Control tables for the translator were produced by Steve Johnson’s compiler- 
compiler YACC, working from a 50-production grammar. Doug Bayer wrote the last pass, which 
assembles code and tables into standard UNIX object format. 
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